Fast Stepwise Regression on Linked Data
Authors
Abstract
The main focus of research in machine learning and statistics is on building ever more advanced and complex models. In practice, however, it is often far more important to use the right variables. One might hope that the recent popularity of open data would allow researchers to easily find relevant variables, but current linked-data methodology is not suitable for this purpose, since the number of matching datasets is often overwhelming. This paper proposes a method that uses correlation-based indexing of linked datasets to significantly speed up feature selection based on the classical stepwise regression procedure. The technique is efficient enough to be applied at interactive speed to huge collections of publicly available linked open data.
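The abstract combines two ideas: a cheap correlation index that ranks candidate variables, and exact stepwise regression applied only to a small shortlist. The sketch below illustrates that combination on in-memory NumPy arrays; it is a minimal illustration, not the paper's implementation, and the function name `correlation_indexed_stepwise` and the `top_m` shortlist parameter are assumptions introduced here.

```python
import numpy as np

def correlation_indexed_stepwise(X, y, k, top_m=10):
    """Forward stepwise selection sped up by a correlation index.

    Illustrative sketch (not the paper's method): at each step, rank the
    remaining candidate columns by absolute correlation with the current
    residual, then evaluate only the top_m candidates by their actual
    reduction in residual sum of squares.
    """
    n, p = X.shape
    selected = []
    residual = y - y.mean()
    for _ in range(k):
        remaining = [j for j in range(p) if j not in selected]
        # cheap "index" step: rank candidates by |corr(x_j, residual)|
        corrs = [abs(np.corrcoef(X[:, j], residual)[0, 1]) for j in remaining]
        order = np.argsort(corrs)[::-1][:top_m]
        shortlist = [remaining[i] for i in order]
        # exact step: evaluate the shortlist by least-squares fit
        best_j, best_rss = None, np.inf
        for j in shortlist:
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            r = y - A @ coef
            rss = float(r @ r)
            if rss < best_rss:
                best_rss, best_j = rss, j
        selected.append(best_j)
        # refit on all selected columns to update the residual
        A = np.column_stack([np.ones(n), X[:, selected]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
    return selected
```

The design point is that the correlation pass costs O(np) per step, while the expensive least-squares refits are confined to the shortlist; in the paper's setting the correlation ranking is precomputed as an index over many linked datasets rather than computed on the fly.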
Similar works
Stepwise regression for unsupervised learning
I consider unsupervised extensions of the fast stepwise linear regression algorithm [5]. These extensions allow one to efficiently identify highly-representative feature variable subsets within a given set of jointly distributed variables. This in turn allows for the efficient dimensional reduction of large data sets via the removal of redundant features. Fast search is effected here through th...
Full text
Least angle and l1 penalized regression: A review
Least Angle Regression is a promising technique for variable selection applications, offering a nice alternative to stepwise regression. It provides an explanation for the similar behavior of LASSO (l1-penalized regression) and forward stagewise regression, and provides a fast implementation of both. The idea has caught on rapidly, and sparked a great deal of research interest. In this paper, w...
Full text
A Stepwise Regression Method and Consistent Model Selection for High-dimensional Sparse Linear Models by Ching-Kang Ing
We introduce a fast stepwise regression method, called the orthogonal greedy algorithm (OGA), that selects input variables to enter a p-dimensional linear regression model (with p >> n, the sample size) sequentially, so that the selected variable at each step minimizes the residual sum of squares. We derive the convergence rate of OGA as m = m_n becomes infinite, and also develop a consistent model ...
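The greedy step this abstract describes can be stated compactly: picking the (standardized) variable most correlated with the current residual is equivalent to picking the one that most reduces the residual sum of squares, and the "orthogonal" part means the residual is then recomputed against the full span of the selected columns. A minimal sketch under those assumptions (not Ing's implementation; the function name `oga` is introduced here):

```python
import numpy as np

def oga(X, y, m):
    """Orthogonal greedy algorithm, illustrative sketch.

    At each of m steps, pick the standardized column most correlated with
    the current residual, then recompute the residual by projecting the
    centered response onto the span of all selected columns (via QR).
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
    yc = y - y.mean()
    resid = yc.copy()
    selected = []
    for _ in range(m):
        scores = np.abs(Xs.T @ resid)          # |corr| up to a constant
        if selected:
            scores[selected] = -np.inf         # exclude chosen columns
        j = int(np.argmax(scores))
        selected.append(j)
        # orthogonal step: project yc onto span of selected columns
        Q, _ = np.linalg.qr(Xs[:, selected])
        resid = yc - Q @ (Q.T @ yc)
    return selected
```

Because the projection uses the whole selected set at every step, OGA avoids the double-counting that plain greedy correlation screening suffers when predictors are themselves correlated.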
Full text
Correlated Component Regression: A Prediction/Classification Methodology for Possibly Many Features
A new ensemble dimension reduction regression technique, called Correlated Component Regression (CCR), is proposed that predicts the dependent variable based on K correlated components. For K = 1, CCR is equivalent to the corresponding Naïve Bayes solution, and for K = P, CCR is equivalent to traditional regression with P predictors. An optional step-down variable selection procedure provides a...
Full text
Bayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data
This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used for modeling the response variables. We propose an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...
Full text